Vision-Based Navigation III: Pose and Motion from Omnidirectional Optical Flow and a Digital Terrain Map
An algorithm for pose and motion estimation using corresponding features in
omnidirectional images and a digital terrain map is proposed. In a previous
paper, such an algorithm was considered for a regular camera. Using a Digital
Terrain (or Digital Elevation) Map (DTM/DEM) as a global reference enables
recovering the absolute position and orientation of the camera. In order to do
this, the DTM is used to formulate a constraint between corresponding features
in two consecutive frames. In this paper, these constraints are extended to
handle non-central projection, as is the case with many omnidirectional
systems. The utilization of omnidirectional data is shown to improve the
robustness and accuracy of the navigation algorithm. The feasibility of this
algorithm is established through lab experimentation with two kinds of
omnidirectional acquisition systems: the first is a polydioptric camera, and
the second a catadioptric camera.
Comment: 6 pages, 9 figures
Localization and Positioning Using Combinations of Model Views
A method for localization and positioning in an indoor environment is presented. The method is based on representing the scene as a set of 2D views and predicting the appearances of novel views by linear combinations of the model views. The method is accurate under weak perspective projection. Analysis of this projection, as well as experimental results, demonstrates that in many cases this approximation is sufficient to describe the scene accurately. When the weak perspective approximation is invalid, an iterative solution can be employed to account for the perspective distortions. A simple algorithm for repositioning, the task of returning to a previously visited position defined by a single view, is derived from this method.
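Under weak perspective, each image coordinate is a linear functional of the scene points, so the coordinates of a novel view can be regressed from corresponding coordinates in a few model views. A minimal least-squares sketch of this idea (all cameras and points below are synthetic, not the paper's data):

```python
import numpy as np

# Two model views of the same N scene points (N x 2 image coordinates each).
# Under weak perspective, a novel view's coordinates are (approximately)
# linear combinations of the model views' coordinates plus a constant term.
rng = np.random.default_rng(0)
pts3d = rng.normal(size=(50, 3))                       # hypothetical scene points

def weak_perspective(pts, R, s=1.0, t=(0.0, 0.0)):
    """Project 3-D points with a scaled-orthographic (weak perspective) camera."""
    return s * pts @ R[:2].T + np.asarray(t)

def rotation(ax, ay):
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax),  np.cos(ax)]])
    Ry = np.array([[ np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    return Ry @ Rx

m1 = weak_perspective(pts3d, rotation(0.0, 0.0))       # model view 1
m2 = weak_perspective(pts3d, rotation(0.1, 0.3))       # model view 2
novel = weak_perspective(pts3d, rotation(0.05, 0.15))  # view to predict

# Basis: x and y coordinates of both model views plus a constant column.
A = np.column_stack([m1, m2, np.ones(len(pts3d))])
coeffs, *_ = np.linalg.lstsq(A, novel, rcond=None)
pred = A @ coeffs
print(np.abs(pred - novel).max())  # small residual: novel view well explained
```

Because the model coordinates generically span the space of linear functionals of (X, Y, Z), the residual is near machine precision for a rigid scene.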
Semantic Parsing of Colonoscopy Videos with Multi-Label Temporal Networks
Following the successful debut of polyp detection and characterization, more
advanced automation tools are being developed for colonoscopy. The new
automation tasks, such as quality metrics or report generation, require
understanding of the procedure flow that includes activities, events,
anatomical landmarks, etc. In this work we present a method for automatic
semantic parsing of colonoscopy videos. The method uses a novel DL multi-label
temporal segmentation model trained in supervised and unsupervised regimes. We
evaluate the accuracy of the method on a test set of over 300 annotated
colonoscopy videos, and use ablation to explore the relative importance of
the method's various components.
3D Human Body-Part Tracking and Action Classification Using a Hierarchical Body Model
This paper presents a framework for hierarchical 3D articulated human body-part tracking and action classification. We introduce a Hierarchical Annealing Particle Filter (H-APF) algorithm, which applies nonlinear dimensionality reduction of the high dimensional data space to the low dimensional latent spaces combined with the dynamic motion model and the Hierarchical Human Body Model. The improved annealing approach is used for the propagation between different body models and sequential frames. The tracking algorithm generates trajectories in the latent spaces, which provide low dimensional representations of body poses, observed during the motion. These trajectories are used to classify human motions. The tracking and classification algorithms were checked on HumanEva-I, HumanEva-II, and other datasets, involving more complicated motion types and transitions, and proved to be effective and robust. The comparison to other methods and the error calculations are provided.
Visual Tracking by Affine Kernel Fitting Using Color and Object Boundary
Kernel-based trackers aggregate image features within the support of a kernel (a mask) regardless of their spatial structure. These trackers spatially fit the kernel (usually in location and in scale) such that a function of the aggregate is optimized. We propose a kernel-based visual tracker that exploits the constancy of color and the presence of color edges along the target boundary. The tracker estimates the best affinity of a spatially aligned pair of kernels, one of which is color-related and the other of which is object boundary-related. In a sense, this work extends previous kernel-based trackers by incorporating the object boundary cue into the tracking process and by allowing the kernels to be affinely transformed instead of only translated and isotropically scaled. These two extensions make for more precise target localization. Moreover, a more accurately localized target facilitates safer updating of its reference color model, further enhancing the tracker’s robustness. The improved tracking is demonstrated for several challenging image sequences.
Weakly-Supervised Surgical Phase Recognition
A key element of computer-assisted surgery systems is phase recognition of
surgical videos. Existing phase recognition algorithms require frame-wise
annotation of a large number of videos, which is time and money consuming. In
this work we join concepts of graph segmentation with self-supervised learning
to derive a random-walk solution for per-frame phase prediction. Furthermore,
we utilize within our method two forms of weak supervision: sparse timestamps
or few-shot learning. The proposed algorithm enjoys low complexity and can
operate in low-data regimes. We validate our method by running experiments with
the public Cholec80 dataset of laparoscopic cholecystectomy videos,
demonstrating promising performance in multiple setups.
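The random-walk idea can be illustrated with generic seeded label propagation on a frame-similarity graph (a simplified stand-in for the paper's graph-segmentation formulation; all frames, features, and labels below are synthetic):

```python
import numpy as np

# Toy sketch: propagate sparse timestamp labels to all frames via a random
# walk on a frame-similarity graph. This is generic label propagation, not
# the paper's exact method; every quantity below is made up.
rng = np.random.default_rng(1)
n_frames, n_phases = 60, 3
true_phase = np.repeat(np.arange(n_phases), n_frames // n_phases)
# Hypothetical frame features: phase-dependent cluster centers plus noise.
feats = true_phase[:, None] * 2.0 + rng.normal(scale=0.3, size=(n_frames, 4))

# Affinity between frames (Gaussian kernel on feature distance).
d2 = ((feats[:, None] - feats[None]) ** 2).sum(-1)
W = np.exp(-d2 / 2.0)
P = W / W.sum(1, keepdims=True)        # row-stochastic transition matrix

# Sparse supervision: one labelled timestamp per phase.
labels = np.zeros((n_frames, n_phases))
for p in range(n_phases):
    labels[np.flatnonzero(true_phase == p)[5], p] = 1.0
seeded = labels.any(1)

# Iterate the walk, clamping the seeded frames to their known labels.
F = labels.copy()
for _ in range(200):
    F = P @ F
    F[seeded] = labels[seeded]
pred = F.argmax(1)
print((pred == true_phase).mean())  # fraction of frames labelled correctly
```

With well-separated phase clusters, a single timestamp per phase is enough for the walk to label nearly every frame correctly.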
Recognition by Functional Parts
(Also cross-referenced as CAR-TR-703)
We present an approach to function-based object recognition that
reasons about the functionality of an object's intuitive parts. We extend
the popular "recognition by parts" shape recognition framework to support
"recognition by functional parts", by combining a set of functional
primitives and their relations with a set of abstract volumetric shape
primitives and their relations. Previous approaches have relied on more
global object features, often ignoring the problem of object segmentation
and thereby restricting themselves to range images of unoccluded scenes.
We show how these shape primitives and relations can be easily recovered
from superquadric ellipsoids which, in turn, can be recovered from
either range or intensity images of occluded scenes. Furthermore, the
proposed framework supports both unexpected (bottom-up) object
recognition and expected (top-down) object recognition. We demonstrate
the approach on a simple domain by recognizing a restricted class of
hand-tools from 2-D images.
Clinical BERTScore: An Improved Measure of Automatic Speech Recognition Performance in Clinical Settings
Automatic Speech Recognition (ASR) in medical contexts has the potential to
save time, cut costs, increase report accuracy, and reduce physician burnout.
However, the healthcare industry has been slower to adopt this technology, in
part due to the importance of avoiding medically-relevant transcription
mistakes. In this work, we present the Clinical BERTScore (CBERTScore), an ASR
metric that penalizes clinically-relevant mistakes more than others. We
demonstrate that this metric more closely aligns with clinician preferences on
medical sentences as compared to other metrics (WER, BLEU, METEOR, etc.),
sometimes by wide margins. We collect a benchmark of 13 clinician preferences
on 149 realistic medical sentences called the Clinician Transcript Preference
benchmark (CTP), demonstrate that CBERTScore more closely matches what
clinicians prefer, and release the benchmark for the community to further
develop clinically-aware ASR metrics.
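The core idea, weighting token-level similarity by clinical importance, can be sketched with toy embeddings (the real CBERTScore uses contextual BERT embeddings; every embedding and weight below is made up for illustration):

```python
import numpy as np

# Toy sketch of a clinically weighted BERTScore-style metric: greedy token
# matching by embedding similarity, with larger weights on clinically
# important tokens. Embeddings and weights are hypothetical.
emb = {                                    # hypothetical 3-d token embeddings
    "patient": [1.0, 0.1, 0.0], "has": [0.0, 1.0, 0.1],
    "hypertension": [0.2, 0.0, 1.0], "hypotension": [0.3, 0.0, 0.9],
}
clin_weight = {"hypertension": 5.0, "hypotension": 5.0}  # clinical terms

def vec(tok):
    v = np.array(emb[tok])
    return v / np.linalg.norm(v)

def weighted_score(ref_toks, hyp_toks):
    """Weighted recall: each reference token matches its most similar
    hypothesis token; clinical tokens contribute with larger weight."""
    sims = np.array([[vec(r) @ vec(h) for h in hyp_toks] for r in ref_toks])
    w = np.array([clin_weight.get(t, 1.0) for t in ref_toks])
    return float((w * sims.max(1)).sum() / w.sum())

ref = ["patient", "has", "hypertension"]
good = ["patient", "has", "hypertension"]
bad = ["patient", "has", "hypotension"]   # clinically dangerous substitution
print(weighted_score(ref, good) > weighted_score(ref, bad))  # True
```

The up-weighted clinical term makes the dangerous substitution cost more than an equally similar non-clinical error would.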
Blind decomposition of transmission light microscopic hyperspectral cube using sparse representation
In this paper, we address the problem of fully automated decomposition of hyperspectral images for transmission light microscopy. The hyperspectral images are decomposed into spectrally homogeneous compounds. The resulting compounds are described by their spectral characteristics and optical density. We present the multiplicative physical model of image formation in transmission light microscopy, justify the reduction of the hyperspectral image decomposition problem to a blind source separation problem, and provide a method for hyperspectral restoration of the separated compounds. In our approach, dimensionality reduction using principal component analysis (PCA) is followed by a blind source separation (BSS) algorithm. The BSS method is based on a sparsifying transformation of the observed images and a relative Newton optimization procedure. The presented method was verified on hyperspectral images of biological tissues and compared to an existing approach based on nonnegative matrix factorization. Experiments showed that the presented method is faster and better separates the biological compounds from imaging artifacts. The results obtained in this work may be used for improving automatic microscope hardware calibration and computer-aided diagnostics.
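The reduction from the multiplicative formation model to linear mixing, which justifies the PCA step, can be sketched as follows (synthetic spectra and abundances; the relative-Newton BSS step itself is omitted):

```python
import numpy as np

# Sketch of the model reduction only (not the paper's relative-Newton BSS):
# Beer-Lambert-style multiplicative image formation becomes linear mixing of
# optical densities after a log transform, so PCA on the log-data reveals
# the number of spectrally homogeneous compounds. All spectra are synthetic.
rng = np.random.default_rng(2)
n_bands, n_pix = 32, 2000
lam = np.linspace(0.0, 1.0, n_bands)
spectra = np.stack([np.exp(-((lam - c) / 0.15) ** 2)      # two hypothetical
                    for c in (0.3, 0.7)])                  # compound spectra
abund = rng.uniform(0.0, 2.0, size=(2, n_pix))             # per-pixel densities

# Multiplicative formation: transmitted intensity = prod_k exp(-a_k * s_k).
I = np.exp(-(spectra.T @ abund))                           # bands x pixels
od = -np.log(I)                                            # optical density
od -= od.mean(1, keepdims=True)

# PCA: with two compounds, the noise-free optical-density data is rank 2.
s = np.linalg.svd(od, compute_uv=False)
print(s[:4] / s[0])   # only the first two singular values are significant
```

After this linearization, any BSS algorithm can be applied in the reduced PCA subspace; the paper's sparse relative-Newton method is one such choice.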